Cardiac arrhythmias classification using photoplethysmography database

Worldwide, Cardiovascular Diseases (CVDs) are the leading cause of death. Patients at high cardiovascular risk require long-term follow-up for early CVDs detection. Generally, cardiac arrhythmia detection through the electrocardiogram (ECG) signal has been the basis of many studies. This technique does not provide sufficient information in addition to a high false alarm potential. In addition, the electrodes used to record the ECG signal are not suitable for long-term monitoring. Recently, the photoplethysmogram (PPG) signal has attracted great interest among scientists as it provides a non-invasive, inexpensive, and convenient source of information related to cardiac activity. In this paper, the PPG signal (online database Physio Net Challenge 2015) is used to classify different cardiac arrhythmias, namely, tachycardia, bradycardia, ventricular tachycardia, and ventricular flutter/fibrillation. The PPG signals are pre-processed and analyzed utilizing various signal-processing techniques to eliminate noise and artifacts, which forms a stage of signal preparation prior to the feature extraction process. A set of 41 PPG features is used for cardiac arrhythmias' classification through the application of four machine-learning techniques, namely, Decision Trees (DT), Support Vector Machines (SVM), K-Nearest Neighbors (KNNs), and Ensembles. Principal Component Analysis (PCA) technique is used for dimensionality reduction and feature extraction while preserving the most important information in the data. The results show a high-throughput evaluation with an accuracy of 98.4% for the KNN technique with a sensitivity of 98.3%, 95%, 96.8%, and 99.7% for bradycardia, tachycardia, ventricular flutter/fibrillation, and ventricular tachycardia, respectively. The outcomes of this work provide a tool to correlate the properties of the PPG signal with cardiac arrhythmias and thus the early diagnosis and treatment of CVDs.

et al. provided a beat classification system using discrete wavelet transform utilizing ECG signal 9 .They achieved a 99.5% classification accuracy for six types of ECG rhythms utilizing an NN classifier with 11 features.Acharya et al. proposed a computer-aided diagnostic system to automatically detect different ECG segments using the convolutional neural network (CNN) technique with an accuracy of 92.5% 10 .Kiranyaz et al. presented an ECGbased classification system with the MIT-BIH database using convolutional neural networks 11 .They claim that their system has achieved a superior classification performance for the detection of ventricular ectopic beats and supraventricular ectopic beats with 99% accuracy.Zhang and Luo developed a multi-lead fused classification scheme to improve the arrhythmias classification using the MIT-BIH database 12 .The classification accuracy was 87.8% with sensitivity for normal, supraventricular ectopic, ventricular ectopic, and fusion of ventricular of 88.6%, 74.3%, 88%, and 73.4%, respectively.Desai et al. presented a study on the utilization of ECG signal in the classification of cardiac arrhythmias, namely; Atrial Fibrillation (A-Fib), Atrial Flutter (AFL), Ventricular Fibrillation (V-Fib), and Normal Sinus Rhythm (NSR) using ensemble classifiers 13 .The results demonstrate that the Rotation Forest (ROF) ensemble classifier achieves the highest accuracy of 98.37% compared to Decision Tree (DT), Random Forest (RAF).Li et al. discusses multi labels classification of heart arrhythmias utilizing the highly dimensional ECG featuers 14 .They proposed a design of a criterion of ECG features based on kernelized fuzzy rough sets for optimal feature subset and subsequently, to enhance the precision of arrhythmia identification.Six metrics were utilized to evluate the performance of the proposed technique.
However the huge number of research that has been performed using the ECG signal for arrhythmias detection, it is subjected to a high likelihood of false alarms as well as the complex electrode connections, making it an undesirable technique for long-term arrhythmia monitoring.Alternatively, Photoplethysmography (PPG) has emerged as a non-invasive, simple, inexpensive, affordable, and convenient technique capable of monitoring hemodynamic parameters.The PPG signal is measured with the optical technique and reflects the volume of blood change in the arteries 15,16 .Additionally, due to the detection principle of the PPG signal in utilizing the optical technique, it is suitable and widely accepted for integration with wearable devices.Thereby, it is considered the preferable choice as an alternative to the ECG technique for cardiac arrhythmias monitoring 17,18 .Limited research has been performed for multiple arrhythmias detection using the MIMIC/PhysioNet online PPG signal database.Chong et al. proposed an arrhythmia classification technique using the PPG signal for smartphones.Four arrhythmias were considered, normal sinus rhythm (NSR), atrial fibrillation (AF), premature ventricular contractions (PVC), and premature atrial contraction (PAC).The results showed that the system could classify the PAC and PVC with an accuracy of 97% and 100%, respectively.Sološenko et al.Proposed a PPG-based system for the detection of bradycardia and tachycardia arrhythmias using a dual-branch convolutional neural network (CNN) 19 .The PhysioNet/CinC 2015 Challenge Database was used as input for the performance evaluation of the system.The results showed that the system could detect bradycardia and tachycardia arrhythmias with an accuracy of 98% and 77%, respectively.In another article, Sološenko et al. presented a method for detecting premature ventricular contractions (PVCs) using the PhysioNet PPG signal databases for training and testing 20 .The artificial neural network was used as a classifier with an accuracy and sensitivity of 92% and 93%, respectively.Cheng et al. developed a machine-learning model for atrial fibrillation detection using PPG signals from online databases 21 .They utilized deep learning techniques with time-frequency analysis.The results showed that the developed system is able to detect atrial fibrillation with an accuracy of 98%.Feed-forward artificial neural network was utilized by Neha et al. to classify four different types of arrhythmias (i.e.normal, PVC, atrial flutter (AFI), and Sinus Tachycardia (ST)) 22 .The feature extraction process was performed utilizing the dynamic time warping technique applied to the PPG signal from the PhysioNet MIMIC-II database.The results showed that the proposed model could classify multiple types of arrhythmias with 96% accuracy.In another work, Neha et al. utilized a set of morphological features for multiple arrhythmias classification with a statistical learning-based approach 23 .The results showed that the proposed detection technique has 94% detection accuracy.Clifford designed an algorithm to estimate the heart rate and detect PVC on the PhysioNet MIMIC PPG signal database 24 .With two morphological features, they were able to a PVC detection accuracy of 99%.Although some previous studies have shown good performance, they are not exhaustive in that they are limited to one or two types of arrhythmias, restricted to one classification technique, or hindered by inter-waveform alignment problems.Additionally, the features used in the detection stage were limited to the utilization of either time-domain or frequency-domain features.This study exclusively utilized the PPG signal to identify arrhythmias.The detection stage employed a variety of features, including morphological, statistical, and frequency domain features, to investigate four types of cardiac arrhythmias, namely bradycardia, tachycardia, ventricular fibrillation or flutter (VFB), and ventricular tachycardia (VTA).The classification performance was assessed in conjunction with significant features and employing various machine learning techniques, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Trees (DT), and Ensemble techniques.This research contributes to the advancement of portable and wearable devices that can be used for the diagnosis and monitoring of various types of cardiac arrhythmias 25 .

Materials and methods
In this research, the PPG waveforms obtained from publicly accessible databases (specifically, PhysioNet/Computing in Cardiology Challenge 2015) were utilized to train, validate, and test classification models.These databases contain over 750 raw waveforms from bedside monitors in intensive care units (ICUs), each consisting of two ECG leads and at least one pulsatile waveform of either PPG or arterial blood pressure (ABP) waveform for a duration of 5 min.Each waveform is associated with an alarm that indicates either a true or false arrhythmia event.For this study, only PPG waveforms with true alarms were considered.The PPG signals utilized in this study are raw signals, imitating the dynamic scenarios of signal acquisition.The preprocessing step was used to prepare the data for feature extraction as shown in Fig. 2. The preprocessing includes filtration of the signal utilizing a bandpass filter between (0.05 Hz to 30 Hz).Furthermore, the signal smoothing using moving average filter, then, the baseline wandering was removed using wavelet transform followed with signal normalization.Each waveform was segmented into 10-s intervals.The total number of arrhythmia cases was 55, comprising 12 cases of Tachycardia, 20 cases of VT, 5 cases of VF, and 18 cases of Bradycardia, in addition to healthy cases.The data was split using the cross-validation technique, and the k-value was set to 10.This k-value is very common in the field of machine learning, as it has been shown to give a low test-error rate with less bias and small variance.The arrhythmias investigated in this study were bradycardia, tachycardia, ventricular fibrillation or flutter (VFB), and ventricular tachycardia (VTA).Figure 1 shows an example of a raw PPG signal.The PPG waveforms were obtained and subsequently subjected to offline analysis and pre-processing.The initial processing steps encompassed filtration, smoothing, wavelet transformation (to eliminate baseline wandering), and normalization.Figure 2 illustrates a block diagram of the pre-processing procedure executed on the PPG waveforms.
The preprocessing stage was succeeded by the feature extraction process, which aimed to analyze the PPG signals.Feature extraction is a crucial process that is necessary to identify the significant features in the signals that are pertinent to the conditions and arrhythmias being investigated.Prior to feature extraction, peaks and valleys detection, notch localization (i.e., identifying the start and end of the notch), and segmentation were carried out for each signal.The detection of valleys in the PPG signal aids in determining the start and end of each pulse, which is a critical step in segmentation (i.e., the process that precedes feature determination for each segment, as depicted in Fig. 3a).The segmentation process is accomplished by dividing the signals into their corresponding pulses by utilizing the valleys (as illustrated in Fig. 3b).
The PPG signal is capable of encoding information pertaining to cardiac arrhythmias, given that these arrhythmias have an impact on the characteristics of the PPG waveform.This study involved the extraction of 41 features from the PPG signal, encompassing both morphological and frequency domain features.The mean and standard deviation were also calculated for all features.Figure 4 provides a visual representation of the principles underlying the morphological features and Table 1 shows example of some PPG features utilized in this study.
The phase of feature extraction is the most critical aspect of the classification process, as the reliability and precision of this step are pivotal in achieving optimal classifier performance 26 .It is imperative to identify the most representative and significant set of features to ensure robustness and accuracy.Subsequently, the classification stage is executed using Machine Learning (ML) techniques.
In the classification of cardiac arrhythmias, preprocessed PPG waveforms obtained from the online database, PhysioNet Challenge 2015, were utilized to extract features that represent the embedded information of each waveform.These features were then employed in various classification models to categorize different types of cardiac arrhythmias.The models were optimized to achieve high classification accuracy.
Various metrics were employed to assess the effectiveness of the classifiers for cardiac arrhythmias, including accuracy, sensitivity, specificity, and precision where:  • Accuracy refers to the proportion of correct predictions to the total number of predictions made.• The concept of specificity pertains to the accurate identification of negative outcomes.It is calculated by dividing the number of true negative predictions by the total number of negative outcomes.
• Precision refers to the proportion of accurate positive predictions to the overall number of positive predic- tions.
In the context of predictive modeling, TP denotes the count of accurate predictions made by the model for the positive class, while TN signifies the count of accurate predictions made for the negative class.Conversely, FP refers to the count of inaccurate predictions made for the positive class, and FN refers to the count of inaccurate predictions made for the negative class.

Ethical approval
The authors affirm that all activities carried out in research involving human subjects were conducted in compliance with the ethical guidelines set forth by the institutional and/or national research committee, as well as the 1964 Helsinki Declaration and its subsequent revisions or equivalent ethical standards.Furthermore, the authors assert that no animals were utilized in the study.

Results and discussions
The features were employed to classify various arrhythmias, such as Bradycardia, Tachycardia, Ventricular Tachycardia (VT), and Ventricular Fibrillation (VF).The outcomes of heart arrhythmias classification, based on 41 features as an input vector, are presented in Table 2.It is noteworthy to mention that several studies [27][28][29][30][31][32] have utilized the PhysioNet challenge database to explore and mitigate erroneous arrhythmia alerts during uninterrupted patient surveillance.Upon examination of the True Positive rate for each class in terms of classification sensitivities, the outcomes were found to be consistent with those documented in previous literature [27][28][29][30][31][32][33] .
The results indicate that the Ensemble and KNN models exhibited the most efficient classification performance, achieving accuracies of 97.4% and 97.3%, respectively, among all the machine-learning techniques employed.It is noteworthy that the Ensemble results in overall classification algorithms demonstrate an enhanced prediction performance compared to any individual algorithm.The sensitivity of bradycardia was high across all machine learning models, accurately classifying instances of bradycardia.This can be attributed to the straightforward nature of certain features related to pulse interval values in bradycardia.
In SVM, KNN, and Ensemble, there exists an inverse correlation between the sensitivity and specificity for VT, VF, and Tachycardia.The sensitivity for VF and Tachycardia is low due to the dominance of the positive  www.nature.com/scientificreports/class, while the specificity for VT is low, indicating a low true negative value and a high false negative value.The high sensitivity of VT is attributed to the large number of VT samples in the dataset.Sensitivity and precision are interrelated as they are both utilized in the calculation of true positives.However, precision is higher as a false positive is lower than a false negative, whereas sensitivity depends on the false negative.
When evaluating the sensitivity of each class, it was found that the classification outcomes achieved through the use of just the PPG signal were superior to those reported in previous studies [27][28][29][30][31][32] , particularly for VT and VF, as demonstrated in Table 3.In a study conducted by the authors of reference 33 , the same Physio Net Challenge 2015 database was utilized to classify cardiac arrhythmias using the PPG signal, resulting in an overall accuracy of 93%, which is lower than the accuracy achieved in this study.As previously mentioned, the exclusive use of the PPG signal has the potential to significantly enhance the detection and monitoring of cardiac arrhythmias, taking advantage of its noninvasive nature.Table 3 provides a summary of the results obtained in the literature for the classification of cardiac arrhythmias using bio-signals, in comparison to the findings of this study.
The Principal Component Analysis (PCA) technique was used to reduce the dimension of the input vector and evaluate the accuracy of different machine learning algorithms versus the number of PCA components.Figure 5 shows the accuracy of different classification techniques as a function of the number of PCA components.The results show that the accuracy did not deteriorate significantly when the number of features in the input vector of the classifier was reduced.This could be beneficial in reducing the number of features without significantly affecting accuracy.The DT algorithm achieved the highest accuracy of 92.5% with only seven PCA components.Conversely, the SVM and KNN algorithms achieved their highest accuracy of 96.4% and 96%, respectively, using twenty-one (for the SVM) and eight PCA components (for the KNN).The Ensemble technique achieved its highest accuracy of 95.3% with nineteen PCA components.Increasing the number of PCA components beyond the specified number did not result in any performance improvement for the classification techniques.
It can be observed that, with the exception of the DT algorithm (both pre-and post-PCA), all classification techniques may be deemed suitable for constructing the classification model.The DT algorithm's inferior accuracy may be attributed to the potential for overfitting, resulting in a complex model that fails to generalize data effectively.Additionally, if the number of waveforms is not evenly distributed across all classes, the DT algorithm may produce a biased model.
The process of selecting features was carried out through the utilization of PCA on the complete datasets.The purpose of the PCA was to examine the level of correlation between each extracted feature and the first principal components, identify the most significant features, and eliminate the irrelevant ones.The 41 features  www.nature.com/scientificreports/ of each signal were transformed into a new set of variables, referred to as principal components, which are uncorrelated and arranged in order of significance, with the first component retaining the highest explained variance among all the original features.The explained variance denotes the information explained by a specific principal component.The process of selecting features was carried out iteratively, as previously described.A summary of the performance of four machine-learning techniques, as the number of features increased, is presented in Fig. 6.The results indicate that the selected features have marginally enhanced the classification performance.Given the similarities between Tachycardia, VF, and VT, it was necessary to exploit the dependent features for each of them by utilizing more features.The classification outcomes and the optimal features selected are summarized in Table 4.
The DT technique achieved a maximum accuracy of 94% and maintained a steady accuracy of 93.1%.On the other hand, other machine learning algorithms demonstrated high classification accuracy, with KNN achieving 98.4%, and Ensemble and SVM achieving both 98%.Figures 6 and 7 indicate that the accuracies and sensitivities of certain features increased while others decreased, which may be attributed to the hyper-parameters of the models.The inclusion of redundant features can lead to overfitting and mislead the algorithm's modeling.Additionally, the sensitivity of the classification, particularly for Tachycardia, can be affected by the inclusion of subjects with multi-arrhythmias.
Based on the findings presented in Fig. 7, it is evident that VT and bradycardia exhibit the highest sensitivity in comparison to other arrhythmias.This observation can be attributed to the larger number of VT samples as compared to Tachycardia and VF.It is noteworthy that the sensitivity, specificity, and precision of arrhythmias remain unaffected by data reduction, as they are influenced by data balancing, which ultimately determines the overall accuracy.The Ensemble classification, utilizing the same selected features as KNN and SVM, achieved an accuracy of 97.3%.Notably, there was no significant difference in accuracy between the fifteen features and thirteen features, with the latter achieving an accuracy of 98%.
The processing time of the training process is contingent upon the intricacy of the problem at hand and the complexity of the machine-learning algorithm employed.The training time for the DT, KNN, and SVM algorithms was found to be comparable.In contrast, the Ensemble technique involves the amalgamation of  Consequently, the computational complexity of the Ensemble technique is higher, and it requires a longer training time than other machine learning algorithms.However, it is noteworthy that the number of learners is not contingent upon the number of features, and the optimizer is capable of identifying the optimal number and size that balances accuracy and speed.The results show that the KNN, SVM, and Ensemble have similar levels of accuracy, as shown in Table 4.The classifier performs better when reducing the number of neighbors.In the case of a dataset with n data points, the KNN classifier utilizes the majority voting scheme to classify new points.On the other hand, only the nearest data points are selected in case the number of neighbors is limited.Increasing the number of data points could result in a decrease in the overall performance due to the increased probability of the inclusion of a data point from other classes.

Conclusions
This study utilized a public online database of PPG signals (PhysioNet Challenge 2015) to classify different cardiac arrhythmias, such as bradycardia, tachycardia, ventricular tachycardia, and ventricular flutter/fibrillation.The PPG features were divided into three sets namely, training, validation, and testing.All PPG signals were preprocessed and features were extracted to be considered as input vectors for the classification model.Four machine learning techniques, namely decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), and ensemble, were utilized in the classification process.The results demonstrated a high level of accuracy in classifying different arrhythmias, with an overall accuracy of 98.4% and sensitivities of 98.3%, 95%, 96.8%, and 99.7% for Bradycardia, Tachycardia, Ventricular Flutter/Fibrillation, and Ventricular Tachycardia, respectively.Furthermore, this study demonstrated that high accuracy in classifying cardiac arrhythmias could be achieved using only one type of bio-signal, specifically the PPG signal, and a limited number of features.This represents a significant advancement in the development of a monitoring system for detecting and classifying various cardiac arrhythmias.

Figure 1 .
Figure 1.An example of raw PPG signal.

Figure 2 .
Figure 2. A block diagram outlining the pre-processing procedure for PPG signals.

Figure 3 .
Figure 3. Example for (a) Valleys' detection, and (b) one segment of the PPG signal.

Figure 4 .
Figure 4.The principles of the morphological features of the PPG signal.

Figure 5 .
Figure 5.The classification accuracy for different techniques with different PCA components.

Figure 6 .
Figure 6.The accuracy of different classification techniques with the number of features.

Figure 7 .
Figure 7.The classification sensitivity of different cardiac arrhythmias; namely AF, Bradycardia, Normal, and Tachycardia by the utilization of different machine learning techniques (a) DT, (b) SVM, (c) KNN, and (d) Ensemble.

Table 1 .
Example of some PPG features utilized in the current study.The features' definitions are shown in Fig. 4.

Table 2 .
Cardiac arrhythmias classification results using 41 PPG features.

Table 3 .
The classification results of cardiac arrhythmias of the current study using the Physio Net Challenge 2015 database compared with the literature.*Using the Ensemble technique.

Table 4 .
Classification results obtained using the selected features.